100 research outputs found

    Language modeling and transcription of the TED corpus lectures

    Transcribing lectures is a challenging task in both acoustic and language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using, respectively, 8 hours of TED transcripts and various types of texts: conference proceedings, lecture transcripts, and conversational speech transcripts. Adaptation of the language model to single speakers was then investigated by exploiting different kinds of information: automatic transcripts of the talk, the title of the talk, the abstract and, finally, the paper. In the last case, a 39.2% WER was achieved.

    Optimized MT Online Learning in Computer Assisted Translation

    In this paper we propose a cascading framework for optimizing online learning in machine translation for the computer-assisted translation scenario. Online learning introduces several hyperparameters associated with the learning algorithm, and the number of online learning iterations can also affect translation quality. We discuss these issues and propose a few approaches for optimizing the hyperparameters and for finding the number of iterations required for online learning. We show experimentally that using the optimal number of iterations in online learning is useful and yields consistent improvements over baseline results.

    Adattamento al Progetto dei Modelli di Traduzione Automatica nella Traduzione Assistita (Project Adaptation of Machine Translation Models in Computer-Assisted Translation)

    Integrating machine translation into computer-assisted translation systems is a challenge for both academic and industrial research. Indeed, professional translators regard as crucial the ability of automatic systems to adapt to their style and to their corrections. In this article we propose a scheme for adapting machine translation systems to a specific document on the basis of a limited amount of manually corrected text, equal to the amount produced daily by a single translator.

    Bootstrapping Arabic-Italian SMT through Comparable Texts and Pivot Translation

    This paper describes efforts towards the development of an Arabic-to-Italian SMT system for the news domain. Since very little parallel data is available for this language pair, we investigated both the exploitation of comparable corpora and pivot translation. Experimental evaluation was conducted on a new benchmark developed by extending two Arabic-to-English NIST evaluation sets with Italian and French translations, produced from the source language by experts. Preliminary results show the potential of both approaches relative to the performance achieved by a popular state-of-the-art Web-based translation service.

    Project Adaptation for MT-Enhanced Computer Assisted Translation

    The effective integration of MT technology into CAT tools is a challenging topic for both academic research and the translation industry. In particular, professional translators consider the ability of MT systems to adapt to their feedback crucial. In this paper, we propose an adaptation scheme to tune a statistical MT system to a translation project using small amounts of post-edited texts. In field tests on two domains with 8 professional translators working with a CAT tool, productivity gains reaching over 20% were measured after applying MT project adaptation.

    Overview of the IWSLT 2017 Evaluation Campaign

    The IWSLT 2017 evaluation campaign organised three tasks: the Multilingual task, which is about training machine translation systems that handle many-to-many language directions, including so-called zero-shot directions; the Dialogue task, which calls for the integration of context information in machine translation in order to resolve anaphoric references that typically occur in human-human dialogue turns; and, finally, the Lecture task, which offers the challenge of automatically transcribing and translating real-life university lectures. Following the tradition of these reports, we describe all tasks in detail and present the results of all runs submitted by the participants.